Whole genome sequence datasets of Salmonella enterica serovar Saintpaul ST50 and serovar Worthington ST592 strains isolated from raw milk in Brazil

Herein we report the draft genome sequences of Salmonella enterica subsp. enterica serovars Saintpaul ST50 and Worthington ST592 isolated from raw milk samples in Northeastern Brazil. The 4,696,281 bp S. Saintpaul ST50 genome contained 4,628 genes in 33 contigs, while the S. Worthington ST592 genome was 4,890,415 bp in length, comprising 4,951 genes in 46 contigs. S. Worthington ST592 carried a conserved Col(pHAD28) plasmid which contains the antimicrobial resistance determinants tet(C), aac(6′)-Iaa, and a nonsynonymous point mutation in ParC (p.T57S). The data could support further evolutionary and epidemiologic studies involving Salmonella organisms.



Description of data collection
Pure cultures of the Salmonella enterica serovar Worthington ST592 and serovar Saintpaul ST50 strains were used for total DNA extraction with a commercial kit (QIAsymphony DSP DNA, Qiagen). Genomic libraries were prepared and sequenced on the MiSeq platform (Illumina). Genomes assembled from the raw reads were annotated and analyzed for in silico multilocus sequence typing, antimicrobial resistance genes, stress response genes, virulence factors, Salmonella Pathogenicity Islands (SPIs), and plasmids.

Value of the Data
• Salmonella enterica is a leading foodborne pathogen causing salmonellosis, a major zoonosis affecting populations in both developed and developing regions worldwide. While there is a plethora of publicly available sequencing data on Salmonella enterica serovars originating from livestock such as pigs and poultry, data on serovars associated with milk and dairy products in Brazil are still scarce.
• The present whole-genome sequencing data describe genomic features associated with important Salmonella serovars in agri-food systems.
• The data reported herein for both isolates can provide valuable information supporting further comparative genomics studies addressing the epidemiology and evolution of Salmonella.

Objective
The consumption of raw milk or dairy products made with raw milk, such as cheese, has been associated with foodborne illness outbreaks worldwide, highlighting the importance of hygiene practices to mitigate Salmonella in the dairy production chain. Recent advances in high-throughput sequencing technologies, together with ever-lower costs, provide the opportunity to obtain in-depth genomic information on pathogens critical to public health. In addition, such information has the potential to trigger important changes toward the implementation of high-resolution monitoring and surveillance systems in the food industry. However, improving monitoring and surveillance systems on a global scale depends on the availability of genomic information. Therefore, this study aimed to provide key genomic features of S. Saintpaul ST50 and S. Worthington ST592 isolated from contaminated raw milk in Northeastern Brazil.

Data Description
Here we report the whole-genome sequencing data of Salmonella enterica subsp. enterica serovar Saintpaul ST50 and serovar Worthington ST592 strains, together with genome screening for antimicrobial resistance (AMR) and virulence factors. Multilocus sequence typing, identification of Salmonella Pathogenicity Islands (SPIs), and plasmid-related data are also described.
The S. Saintpaul genome was assigned to sequence type (ST) 50 and comprised 1,325,200 reads, with 304,796,000 bases and an average of 215 bases per read. The assembly generated 33 contigs with an N50 value of 416,750 bp and 60-fold coverage, with a chromosome of 4,696,281 bp comprising 4,628 coding sequences (CDS), 22 rRNA genes, and 78 tRNA genes. This genome had a G+C content of 52.2%.
The S. Worthington genome was assigned to ST592. A total of 1,487,452 reads were obtained, averaging 294 bases per read, with 446,235,600 total bases. The assembly generated 46 contigs with an N50 value of 238,900 bp, 78-fold coverage, and a G+C content of 52.2%. The chromosome was 4,890,415 bp in length, comprising 4,951 CDS, 12 rRNA genes, and 79 tRNA genes. A graphical representation of the annotated genomes is shown in Fig. 1A and B.
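The assembly metrics quoted above (N50 and G+C content) can be recomputed from any set of contigs. The following is a minimal sketch, assuming the contig sequences are already loaded as plain strings; the function names and the toy contigs are illustrative only and are not part of the reported pipeline or data.

```python
from typing import Iterable, Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50: the length of the shortest contig in the smallest
    set of longest contigs that together cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")


def gc_percent(sequences: Iterable[str]) -> float:
    """Return the G+C content (%) over all unambiguous bases (A, C, G, T)."""
    gc = 0
    total = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        total += s.count("A") + s.count("C") + s.count("G") + s.count("T")
    if total == 0:
        raise ValueError("no unambiguous bases supplied")
    return 100.0 * gc / total


# Toy contigs (hypothetical, NOT the reported assemblies): lengths 20, 12, 8.
toy_contigs = ["GGCC" * 5, "ATGC" * 3, "AATT" * 2]
toy_n50 = n50([len(c) for c in toy_contigs])
toy_gc = gc_percent(toy_contigs)
```

Ambiguous bases such as N are excluded from the G+C denominator here; other tools may count them differently, so small discrepancies against a given annotation service are possible.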
According to the disc-diffusion test, the S. Worthington strain was phenotypically resistant to ciprofloxacin (quinolone), gentamicin (aminoglycoside), and tetracycline, while S. Saintpaul was sensitive to all tested antimicrobials and was considered pan-susceptible. Results of the antimicrobial susceptibility test of the two Salmonella strains are shown in Table 1. Downstream analyses showed that both strains harbored a gene encoding resistance to aminoglycosides [aac(6′)-Iaa]. Detailed information on the genes identified in the CARD and VFDB databases is provided as supplementary data (Supplementary materials S1, S2, S3 and S4). The S. Worthington ST592 genome harbored one plasmid, Col(pHAD28) (KU674895), whilst no plasmids were identified in the S. Saintpaul ST50 genome.

Table 1
Results of antimicrobial susceptibility testing by disc-diffusion method for both Salmonella Saintpaul ST50 and Salmonella Worthington ST592 strains.
The comparative structural analysis of both genomes with S. Typhimurium LT2 is shown in Fig. 3. We identified the five major pathogenicity islands: SPI-1, which plays a key role in host cell invasion; SPI-2, SPI-3, and SPI-4, which are related to bacterial survival and growth; and SPI-5, which appears to mediate inflammation and chloride secretion [2,3].

Salmonella isolates
The two Salmonella enterica isolates were obtained from a cross-sectional investigation involving 197 randomly collected raw milk samples from 50 herds in Northeastern Brazil [4]. Briefly, sterile stainless-steel ladles were used to collect milk samples (100 mL), which were promptly placed on ice and processed within 4 h. Upon arrival at the laboratory, subsamples (25 mL) were pre-enriched in 225 mL of lactose broth (LB). Aliquots of 0.1 mL and 1 mL were subsequently transferred to 9.9 mL of Rappaport-Vassiliadis (RV) broth and 9 mL of Muller-Kauffmann tetrathionate broth (TT), respectively. The RV tubes were then incubated at 42 °C for 24 h, while the TT tubes were incubated at 37 °C for the same time. Thereafter, loopfuls of the enriched cultures were streaked onto xylose lysine desoxycholate agar (XLD) and Hektoen enteric agar (HE) plates. Following incubation at 37 °C for 24 h, characteristic Salmonella colonies were selected and inoculated into triple sugar iron agar (TSI) and lysine iron agar (LIA). Salmonella isolates were confirmed by slide agglutination test with polyvalent somatic (anti-O) and flagellar (anti-H) antisera.
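The enrichment volumes described above imply fixed dilution ratios at each transfer. The sketch below works through that arithmetic with exact fractions; the helper name `dilution_fraction` is illustrative, and the volumes are taken directly from the protocol text.

```python
from fractions import Fraction


def dilution_fraction(inoculum_ml, diluent_ml) -> Fraction:
    """Fraction of the final volume contributed by the inoculum."""
    inoculum = Fraction(str(inoculum_ml))
    diluent = Fraction(str(diluent_ml))
    return inoculum / (inoculum + diluent)


# Pre-enrichment: 25 mL of milk into 225 mL of lactose broth.
pre_enrichment = dilution_fraction(25, 225)

# Selective enrichment: 0.1 mL into 9.9 mL RV, and 1 mL into 9 mL TT.
rv_transfer = dilution_fraction("0.1", "9.9")
tt_transfer = dilution_fraction(1, 9)

# Cumulative dilution of the original milk in each selective broth.
milk_in_rv = pre_enrichment * rv_transfer
milk_in_tt = pre_enrichment * tt_transfer

print(f"pre-enrichment 1:{1 / pre_enrichment}")
print(f"RV transfer 1:{1 / rv_transfer}, cumulative 1:{1 / milk_in_rv}")
print(f"TT transfer 1:{1 / tt_transfer}, cumulative 1:{1 / milk_in_tt}")
```

Passing the decimal volumes as strings keeps the fractions exact and avoids binary floating-point rounding of values such as 0.1 and 9.9.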
BLAST Ring Image Generator (BRIG) version 3.0 [16] was used for genome comparisons. The circular genomic map was constructed with BLAST+ using standard parameters. S. Typhimurium LT2 (GenBank accession number AE006468.2) was used as the reference genome.

Screening for plasmids
Plasmid sequences were predicted with PlasmidFinder [17]. The analyses were carried out on the Center for Genomic Epidemiology (CGE) web server using raw reads and a 90% identity threshold.
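To illustrate what a 90% identity threshold means in practice, the sketch below filters a list of candidate replicon hits by percent identity. It is not the PlasmidFinder implementation: the `RepliconHit` record, the `filter_hits` helper, and the example hits are all hypothetical and serve only to show the cutoff logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RepliconHit:
    """Hypothetical record for one candidate replicon match."""
    name: str
    identity_pct: float  # percent identity to the database sequence


def filter_hits(hits: List[RepliconHit],
                min_identity_pct: float = 90.0) -> List[RepliconHit]:
    """Keep hits whose identity meets or exceeds the threshold."""
    return [h for h in hits if h.identity_pct >= min_identity_pct]


# Made-up example hits (NOT results from the reported genomes).
example_hits = [
    RepliconHit("replicon_A", 97.3),
    RepliconHit("replicon_B", 88.0),
    RepliconHit("replicon_C", 90.0),
]
kept_names = [h.name for h in filter_hits(example_hits)]
print(kept_names)
```

The comparison is inclusive here, so a hit at exactly 90% identity is retained; whether a given tool treats the boundary as inclusive is an assumption of this sketch.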

Ethics Statements
The work meets the ethical requirements for publication in Data in Brief. The work does not involve studies with animals or humans.

Fig. 1. Circular genome maps of Salmonella Saintpaul ST50 (A) and Salmonella Worthington ST592 (B) constructed with PATRIC, a comprehensive bacterial bioinformatics resource. From the outer to the inner ring: contigs (scale: ×1 Mbp), coding sequences (CDS) on the forward strand, CDS on the reverse strand, RNA genes, CDS with homology to known antimicrobial resistance genes, CDS with homology to known virulence factors, GC content, and GC skew.

Fig. 2. SNP cluster tree generated by NCBI's Isolates Browser tool for the Salmonella Worthington ST592 genome. The genome sequence of the S. Worthington ST592 strain isolated from raw milk in Northeastern Brazil is shown in red.

Fig. 3. Comparative genome map of the Salmonella Saintpaul ST50 and Salmonella Worthington ST592 strains constructed with the BLAST Ring Image Generator (BRIG). The red circle represents the genome sequence of the S. Saintpaul ST50 strain, and the blue circle represents the genome sequence of the S. Worthington ST592 strain. The black circle indicates the GC content, and the green and purple circle indicates the GC skew. S. Typhimurium LT2 was used as the reference genome. The red arrows show the positions of the Salmonella Pathogenicity Islands SPI-1, SPI-3, and SPI-5, while the blue arrows show the positions of SPI-2 and SPI-4. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

CRediT Author Statement
Elma L. Leite: Writing - original draft, Data curation; Mauro M.S. Saraiva: Writing - review & editing, Methodology; Priscylla C. Vasconcelos: Methodology; Daniel F.M. Monte: Methodology; Marc W. Allard: Methodology, Writing - review & editing; Patrícia E.N. Givisiez: Writing - review & editing; Wondwossen A. Gebreyes: Writing - review & editing; Oliveiro C. Freitas Neto: Writing - review & editing; Celso J.B. Oliveira: Supervision, Project administration, Writing - review & editing.